On the Effectiveness of Statistical Modeling Based Template Matching Approach for Continuous Speech Recognition
نویسندگان
چکیده
In this work, we validate the effectiveness of our recently proposed integrated template matching and statistical modeling approach on four baseline systems with increasing phone recognition accuracies in the range of 73% to 78% for the TIMIT task. The four baselines were generated using the methods of 1) Discriminative Training (DT) of Minimum Phone Error (MPE), 2) MFCC concatenated with ensemble Multiple Layer Perceptron (MFCC+EMLP) features, 3) DT combined with the MFCC+EMLP features, and 4) data sampling based ensemble acoustic models integrated with DT and MFCC+EMLP features. Experimental results obtained from template matching based rescoring on the phone lattices generated by the baseline models have shown that our template matching approach has produced consistent and significant improvements over the four baselines, and the highest recognition accuracy was 79.55% obtained from rescoring the phone lattices produced by the ensemble acoustic model baseline.
منابع مشابه
Integrate template matching and statistical modeling for speech recognition
We propose a novel approach of integrating template matching with statistical modeling to improve continuous speech recognition. We use multiple Gaussian Mixture Model (GMM) indices to represent each frame of speech templates, use hierarchical agglomerative clustering to generate template representatives, and use log likelihood ratio as the local distance measure for DTW template matching in la...
متن کاملImproved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
متن کاملEvaluation of Similarity Measures for Template Matching
Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کامل